Low power wide area network (LPWAN) technology has expanded and is
essential in the development of applications for the internet of things (IoT).
The Sigfox LPWAN network is characterized by its long-range coverage,
low cost, and low power consumption. In this article, a set of 5174 values is
analyzed, of which 1606 are null RSSI data, obtained with the Sipy module
and MicroPython, which provide a coverage map of several points with a
resolution of 200 meters deployed in Quito, Ecuador. The type of distribution
to which the set of network measurements fits is evaluated, and an
optimal 900 MHz propagation model for suburban environments is
determined from the measurements obtained around the known base station.
As a result, the missing RSSI values were predicted using the inverse normal
distribution method on the original values, observing that they conform to a
logistic distribution. The data from the base station were subjected to a data
augmentation algorithm designed in MATLAB, determining that the
Stanford University Interim (SUI) model reduces the precision error in the
trend of the curve, with no deviations greater than 5 dB, achieving a
precision of 97% with respect to the fit of the curve to the data.
1. INTRODUCTION

Currently, there is a large amount of research on geopositioning measurements for received signal
strength indicator (RSSI) data analysis with different applications [1], [2]. One of them is the internet of
things (IoT), which has become the concept of the internet of the future, since it defines a
system where things in the physical world, and sensors within or near them, are connected to the internet via a
wireless network or a fixed internet connection [3], [4]. Thus, the main objective has been to extend the
benefits of internet connectivity to various other interconnected devices [5], [6]. IoT innovations provide new
types of services, generating new revenue and market segments [7]. This will lead to a significant and
sustained increase in the demand for network bandwidth [8]. In recent years, these technologies have been
introduced widely, causing both industrial organizations and academic associations to become interested in their
feasibility [9]-[11]. IoT is even considered one of the innovation areas that will generate the most capital in the
future with the appearance of new communication technologies such as low power wide area network
(LPWAN), designed to cover long distances and allow the transmission of messages of a few bytes [3], [12].

Sigfox is a reliable, low-cost, and low-power LPWAN communication network that supports the
connection of devices used in IoT [13]. It arrived in Ecuador in September 2019 with the aim of achieving a
more intelligent and digital country, and it is managed by the company wireless network development (WDN)
[14]-[16]. Sigfox works as an operator system, where the sensors registered in the system only allow
capturing the link quality indicator (LQI) and, in development mode, the RSSI. When evaluating the
coverage of a network, the RSSI parameter is a fundamental measurement to represent the relative quality of
the received signal through the power level. In general, the signal is usable if the quality is above 4 at a level
of 25 to 30% [17].

Being a wireless network, LPWAN is prone to places without coverage [17]. For this reason,
measurements were made using a geopositioning prototype, with which information on various parameters of
the Sigfox network in Quito was collected. These include the RSSI, the date and time of the measurement,
the identifier, the horizontal position coordinates (latitude and longitude), the vertical position
coordinate (altitude), and the LQI, which are used to generate a coverage map around the measurement area.
In order to explore the behavior of the Sigfox network in this area, the results were processed and several null
measurements were completed by applying the inverse normal distribution method. A statistical analysis
of the data obtained demonstrated that the applied method has a high accuracy and does not affect the
trend of the values already obtained. The bootstrap data augmentation method was then applied to generate new
values around the known base station. Various propagation models were applied to the data set and
their behavior was analyzed with the help of the mean square error, concluding that the Sigfox network in
suburban environments best fits the Stanford University Interim (SUI) propagation model.


2. METHOD 

This section describes the equipment and the methodology applied in the research. The data taken in
Ecuador cover the Sigfox networks deployed around Quito and the Valle de los Chillos. For the data
collection, different points were placed in the canton, and geolocation hardware built on the
Sipy development board, which is programmed in MicroPython, was used. The program begins by
setting the timer variable that controls how often data are sent. Then a comparison is
made between two coordinate points: a fixed one entered by the user as a reference point and the real
measurement taken by the geopositioning device, in order to compute the distance between the two points.
If the measurement is within a specific radius, it is sent to the Sigfox backend. The data
arrive in hexadecimal format and are forwarded by callback to the Thingspeak data analysis platform. To
make the connection between Sigfox and Thingspeak, an API is needed that allows the connection with
the Sigfox channel through the HTTPS protocol. Figure 1 illustrates the connection architecture between
Sigfox and the Thingspeak platform.
Figure 1. Sigfox and Thingspeak connection architecture
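The distance check that decides whether a measurement is sent can be sketched in Python as follows. This is only an illustrative sketch, not the firmware used on the prototypes: the haversine formula, the function names, the Earth radius constant, and the 80 m default radius (taken from the coverage-map description in this section) are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def should_send(reference, measured, radius_m=80.0):
    """Return True when the measured fix lies within radius_m of the reference point."""
    return haversine_m(*reference, *measured) <= radius_m
```

Here `reference` and `measured` are hypothetical `(latitude, longitude)` tuples; a fix that passes `should_send` would then be encoded and handed to the Sigfox radio.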


After the data sent by the prototypes are saved, they are stored in a MongoDB database, which
allows the visualization of each measurement obtained. Pycom's Pytrack expansion module has been
attached, which has a transmission speed of 600 bps, a transmission power of 22 dBm, and a receiver
sensitivity of -128 dBm. The module integrates the Quectel L76-L receiver, which allows working
simultaneously with the GPS, GLONASS, Galileo and QZSS systems, and gives access to light, humidity,
temperature, accelerometer, and GPS sensors [18].
Two prototypes send the geopositioning coordinates through the Sigfox network at
different points defined in the programming of the Pycom device. One prototype is in development mode,
which allows obtaining the LQI and RSSI values at the same time, while the other prototype, being in
commercial mode, obtains only the LQI value. The geolocation prototypes were connected to the Sigfox base
stations, creating a multi-point coverage map with a resolution of 200 meters between points. When the device
is within a radius of 80 meters from a stop, it connects with the Sigfox stations and sends the latitude,
longitude, and height coordinates along with the RSSI and LQI levels. Covering the largest possible amount of
territory, a total of 5174 measurements were acquired.


2.1. Probability distribution 

Probability distributions are used throughout the sciences to measure, predict, estimate, and 
determine confidence intervals around estimated values. The probability density function (PDF) provides the 
probability that a random variable falls within a specified range [19] and is useful for normalization [20]. The 
pdf for the normal distribution is given in (1): 


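$$ \mathrm{pdf}(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} \tag{1} $$

where $\sigma$ is the standard deviation, $\mu$ is the mean, and $\sigma^{2}$ describes the variance.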

The normal distribution is an approximation that describes the random distribution of real values that
cluster around a single average value [21], and it is used to adequately approximate the value of a continuous
random variable in an ideal situation. Its theoretical importance is very great, and its interest also lies in the
enormous number of practical applications that its generality allows [22]. However, it cannot be
used to describe broad distributions, because its symmetry about the mean would require negative string
lengths. One way to avoid this problem is to assume that the logarithm of the length of the string has a
Gaussian distribution [23]. A lognormal distribution is a continuous probability distribution of a random
variable whose logarithm is normally distributed [24].

Missing values were filled by normal random completion using the inverse normal
distribution function. For the processing and analysis of the data, MATLAB® R2021a installed on a personal
computer was used to perform statistical tests and to plot histograms and PDF curves of the RSSI
data. Using the LQI ranges established in Table 1 from the original data, statistical methods were used to
determine which type of distribution each LQI range fits. The data collection was performed around two different
reference areas: 4766 measurements in strategic places in Quito and 408 measurements at the University of
the Armed Forces-ESPE and around the Valle de los Chillos. The locations of the different base stations
in Quito are not known. However, the location of the base station installed at
the University of the Armed Forces-ESPE is known, so the distances from the 408 locations to the
transmission point were measured.
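The completion of null values with the inverse normal distribution function can be sketched as below. This is a Python illustration under stated assumptions, not the MATLAB routine used in the study: the function name, the fixed random seed, and the clipping of filled values to the RSSI limits of Table 1 are assumptions. Each missing entry is replaced by a uniform random number mapped through the inverse normal CDF built from the mean and standard deviation of the observed values.

```python
import random
from statistics import NormalDist, mean, stdev


def fill_missing_inverse_normal(values, lower=-133.0, upper=-66.0, seed=0):
    """Replace None entries with draws from N(mean, std) of the observed values.

    Clipping to [lower, upper] is an assumption based on the RSSI limits in Table 1.
    """
    observed = [v for v in values if v is not None]
    dist = NormalDist(mean(observed), stdev(observed))
    rng = random.Random(seed)
    filled = []
    for v in values:
        if v is None:
            u = rng.uniform(1e-9, 1 - 1e-9)  # inv_cdf requires 0 < u < 1
            v = min(max(dist.inv_cdf(u), lower), upper)
        filled.append(v)
    return filled
```

Because the draws follow the observed mean and standard deviation, the filled series keeps the location of the original data while changing its dispersion only slightly.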


Table 1. Classification of data by LQI range 


LQI         RSSI mean (dBm)   RSSI range (dBm)
Excellent   -89               [-94, -66]
Good        -107              [-115, -95]
Average     -120              [-127, -116]
Limit       -129              [-133, -128]
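The classification in Table 1 can be expressed as a small lookup. The sketch below is illustrative: the function name and the rounding to whole dBm are assumptions, while the range limits are copied from the table.

```python
# RSSI ranges in dBm, copied from Table 1 (inclusive bounds).
LQI_RANGES = [
    ("Excellent", -94, -66),
    ("Good", -115, -95),
    ("Average", -127, -116),
    ("Limit", -133, -128),
]


def classify_lqi(rssi_dbm):
    """Return the LQI label whose range contains rssi_dbm, or None if outside all ranges."""
    value = round(rssi_dbm)  # assumption: RSSI is reported in whole dBm
    for label, low, high in LQI_RANGES:
        if low <= value <= high:
            return label
    return None
```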


2.2. Propagation models 

Curve fitting examines the relationship between one or more predictors (independent variables) and 
a response variable (dependent variable), with the goal of defining a “best fit” model of the relationship [25]. 
Propagation models are based on the well-known linear trend of losses as a function of the base ten logarithm
[26], as expressed in (2):

$$ a \log_{10}(x) + b \tag{2} $$
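A fit of the form (2) reduces to ordinary least squares once the predictor is replaced by its base ten logarithm. The following sketch shows that step in Python; the function name and the use of distance as the predictor are assumptions for illustration, and the study itself performed curve fitting in MATLAB.

```python
import math


def fit_log_model(distances, losses):
    """Least-squares estimate of (a, b) in loss = a * log10(distance) + b."""
    xs = [math.log10(d) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    a = sxy / sxx            # slope per decade of distance
    b = mean_y - a * mean_x  # intercept at distance = 1
    return a, b
```

With distances in meters, `a` is the loss added per decade of distance and `b` is the fitted loss at 1 m.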
The free space propagation model is used to predict the received power level when there is
a line of sight between the transmitter and the receiver. It is known as the fundamental path loss model [27].


The COST 231 propagation model is a semi-empirical loss prediction model recommended for urban
scenarios in which the transmitting antennas must be located at a height greater than the average height of
the roofs [28]. It is designed to be used in the frequency band from 500 MHz to 2000 MHz. This model is a
mixture of the Ikegami [29] and Walfisch and Bertoni [30] models. The basic equations for the path loss in
decibels (dB) of the free space model and the COST 231 model are given in (3) and (4), respectively:

$$ L_p(\mathrm{dB}) = 32.45 + 20\log_{10} f + 20\log_{10} d \tag{3} $$
$$ L_p(\mathrm{dB}) = 42.6 + 20\log_{10} f + 26\log_{10} d \tag{4} $$

where $f$ is the frequency in MHz and $d$ is the distance between the access points (AP) and the antennas
in km.

The SUI propagation loss model is designed for urban, suburban and rural terrain, assuming a
transmitting antenna height between 10 and 80 meters [31]. According to IEEE 802.18 this model is suitable
for WiMAX systems [32]. The path loss for the basic SUI model is given by:


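$$ L_p(\mathrm{dB}) = A + 10\gamma \log_{10}\!\left(\frac{d}{d_0}\right) + S + \Delta L_f + \Delta L_h \tag{5} $$

where $d$ is the distance between base and receiver, $S$ is the shadowing term, and $A$ represents the free
space path loss:

$$ A = 20\log_{10}\!\left(\frac{4\pi d_0}{\lambda}\right) \tag{6} $$

Here $d_0$ is a reference distance equal to 100 m and $\lambda$ is the wavelength. $\gamma$ represents the loss
exponent, defined by:

$$ \gamma = a - b\,h_b + \frac{c}{h_b} \tag{7} $$

where $h_b$ is the height of the base station antenna in meters. $\Delta L_h$ is the correction factor, defined
here for category B, and $\Delta L_f$ is the frequency correction factor. They are established by:

$$ \Delta L_h = -10.8\log_{10}\!\left(\frac{h_r}{2}\right) \tag{8} $$
$$ \Delta L_f = 6\log_{10}\!\left(\frac{f}{2000}\right) \tag{9} $$

where $h_r$ is the height of the receiving antenna in meters and $f$ is the frequency in MHz.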


The coefficients a, b, and c depend on the type of terrain, according to Table 2. With
the set of 408 measurements from Valle de los Chillos, the data were increased to 1000 values
using the bootstrap method. The new data were modeled and compared with the COST
231, free space, and SUI propagation models. Figure 2 illustrates the stages that were performed to take the
measurements, starting from the connection of the IoT devices to the Sigfox network so that the
measurements reach the backend. The data are then sent to Thingspeak and on to the MongoDB
server, producing a database that allows coverage maps to be viewed and analyzed.
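The growth of the data set from 408 to 1000 values by bootstrap resampling can be sketched as follows. The details of the MATLAB algorithm used in the study are not given here, so this Python sketch assumes the simplest variant: the original samples are kept and the additional ones are drawn from them uniformly with replacement. The function name and the seed are hypothetical.

```python
import random


def bootstrap_augment(samples, target_size=1000, seed=0):
    """Grow samples to target_size by appending draws taken with replacement."""
    if target_size < len(samples):
        raise ValueError("target_size must be at least len(samples)")
    rng = random.Random(seed)
    extra = rng.choices(samples, k=target_size - len(samples))
    return list(samples) + extra
```

Each sample can be a `(distance_m, rssi_dbm)` pair, so that resampled points keep the measured relation between distance and signal level.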


Table 2. SUI model parameters 


Factor Urban Suburban Rural 
a 4.6 4.0 3.6 
b 0.0075 0.0065 0.005 
c 12.6 17.1 20 
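The three path loss models can be evaluated with a few lines of Python. The sketch below follows equations (3) to (9) and the coefficients of Table 2; the function names, the zero default for the shadowing term, and the default antenna heights (25 m and 2 m, the values used in the simulations of this article) are assumptions made for illustration.

```python
import math

SPEED_OF_LIGHT = 299792458.0  # m/s

# Terrain coefficients (a, b, c) copied from Table 2.
SUI_TERRAIN = {
    "urban": (4.6, 0.0075, 12.6),
    "suburban": (4.0, 0.0065, 17.1),
    "rural": (3.6, 0.005, 20.0),
}


def free_space_loss(f_mhz, d_km):
    """Equation (3): free space path loss in dB."""
    return 32.45 + 20 * math.log10(f_mhz) + 20 * math.log10(d_km)


def cost231_loss(f_mhz, d_km):
    """Equation (4): COST 231 path loss in dB."""
    return 42.6 + 20 * math.log10(f_mhz) + 26 * math.log10(d_km)


def sui_loss(f_mhz, d_m, h_b=25.0, h_r=2.0, terrain="suburban",
             shadowing_db=0.0, d0=100.0):
    """Equations (5)-(9): basic SUI path loss in dB, with d_m and d0 in meters."""
    a, b, c = SUI_TERRAIN[terrain]
    wavelength = SPEED_OF_LIGHT / (f_mhz * 1e6)
    A = 20 * math.log10(4 * math.pi * d0 / wavelength)  # (6)
    gamma = a - b * h_b + c / h_b                       # (7)
    delta_h = -10.8 * math.log10(h_r / 2.0)             # (8)
    delta_f = 6 * math.log10(f_mhz / 2000.0)            # (9)
    return A + 10 * gamma * math.log10(d_m / d0) + shadowing_db + delta_f + delta_h
```

For example, `sui_loss(900, 1000.0)` evaluates the suburban model at 1 km, while `free_space_loss(900, 1.0)` and `cost231_loss(900, 1.0)` give the other two models at the same distance.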
Figure 2. Block diagram of the implemented method


Description and analysis of Sigfox received signal strength indicator dataset by ... (Román A. Lara-Cueva) 


1586 ISSN: 2302-9285


3. RESULTS AND DISCUSSION 

A network coverage map based on RSSI data taken around the base stations is available to guarantee
the service of IoT applications developed with the Sigfox network. Figure 3 shows the Sigfox network
coverage map of Quito. Figures 3(a) and (e) represent the perimeter areas of the city of Quito, which have a
lower concentration of high-rise buildings and therefore minimal coverage dead zones; since they have Sigfox
network coverage, they are better suited to the implementation of IoT networks. In Figures 3(b) and (d)
it is observed that the area covered by the network decreases when approaching the downtown area of the
city; however, it still presents good conditions for the application of IoT technologies. On the other hand,
Figure 3(c) illustrates a large dead zone in the downtown area of the city, because the telecommunications
regulation and control agency (ARCOTEL) specifies that telecommunication stations must
be located in a healthy environment, without contamination, and must protect the cultural heritage of the city.
Therefore, in the historic center of the city there is a shortage of Sigfox network infrastructure.
In this case, the technology cannot provide network coverage under the best conditions and the
implementation of IoT technologies is difficult. Figure 4 shows the coverage area of the Valle de los
Chillos base station. Being in a suburban area, the implementation of Sigfox networks presents better
conditions, because there are larger residential areas, and it can be seen that there are no dead points
around the area where the measurements were taken.

Figure 3. Sigfox network coverage map of Quito: (a) north, (b) centre-north, (c) centre, (d) centre-south, and 
(e) south 


Figure 4. Sigfox network coverage map of Valle de los Chillos 
Table 3 presents statistics of the 5174 measurements, comparing the data with
missing values against the data in which the null RSSI values were completed with the inverse normal
distribution method, which fills in missing values using the mean and standard deviation of the data. The
kurtosis analysis determines the degree of outliers with respect to a normal distribution. The original data
present a low concentration of the measurements around their mean, while filling in the null values changes
this aspect. However, in neither case is the value very high, which is why the data are very similar to a normal
distribution.


Table 3. RSSI dataset statistics for the area around Quito with 1606 null values and filled applying the 
inverse normal distribution method 


Statistics      RSSI data with 1606 missing values (dBm)   RSSI data with filled missing values (dBm)
Minimum         -133.0000                                  -133.0000
Maximum         -66.0000                                   -66.0000
Median          -113.0000                                  -110.0000
Mean            -111.4198                                  -110.1320
Mode            -116.0000                                  -110.0000
Standard dev.   11.5829                                    10.2395
Variance        134.1629                                   104.8469
Kurtosis        -0.1299                                    0.1718
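Statistics of the kind listed in Table 3 can be computed with the Python standard library, as sketched below. The exact estimators used in MATLAB for the table are not stated, so the population form of the kurtosis and the subtraction of 3 (excess kurtosis, which is 0 for a normal distribution) are assumptions suggested by the near-zero values in the table; the function names are hypothetical.

```python
from statistics import fmean, median, pvariance


def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    mu = fmean(values)
    var = pvariance(values, mu)
    fourth = fmean([(v - mu) ** 4 for v in values])
    return fourth / var ** 2 - 3.0


def summarize(values):
    """Return a few of the Table 3 statistics for a list of RSSI values in dBm."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "median": median(values),
        "mean": fmean(values),
        "kurtosis": excess_kurtosis(values),
    }
```

Applying `summarize` to the RSSI series before and after filling the null values gives the two columns being compared.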


Tables 4 and 5 present the matrix of correlation coefficients and statistical characteristics of the
RSSI data measured in Quito and Valle de los Chillos, without and with the inverse normal distribution
method applied, with the data divided by LQI range. It can be observed that the changes in the statistical
characteristics in each LQI range are very small, which evidences the accuracy of the inverse normal
distribution algorithm. The value that changes the most in relation to the original data is the variance, which
represents the variability of a data series with respect to its mean [33]. A small decrease in the dispersion of
the data with respect to its mean is observed. This effect was most evident in the “Good” range, where the
variance decreased by 12.77% (from 31.9707 to 27.8865).


Table 4. Correlation matrix of the RSSI dataset statistics for the area around Quito without applying the 
inverse normal distribution method 
                Link quality indicator (LQI)
Statistics      Excellent (dBm)   Good (dBm)   Average (dBm)   Limit (dBm)
Minimum         -94.0000          -115.0000    -127.0000       -133.0000
Maximum         -66.0000          -95.0000     -116.0000       -128.0000
Median          -91.0000          -108.0000    -120.0000       -129.0000
Mean            -89.0574          -107.4328    -120.7632       -129.5022
Mode            -91.0000          -114.0000    -116.0000       -128.0000
Standard dev.   4.7993            5.6543       3.3392          1.3166
Variance        23.0333           31.9707      11.1504         1.7335
Kurtosis        2.5622            -0.7536      -1.0909         -0.8204


Table 5. Correlation matrix of the RSSI dataset for the area around Quito applying the inverse normal 
distribution method 
                Link quality indicator (LQI)
Statistics      Excellent (dBm)   Good (dBm)   Average (dBm)   Limit (dBm)
Minimum         -94.0000          -115.0000    -127.0000       -133.0000
Maximum         -66.0000          -95.0000     -116.0000       -128.0000
Median          -91.0000          -108.0000    -120.0000       -129.0000
Mean            -89.0574          -107.2523    -120.7627       -129.5267
Mode            -91.0000          -110.0000    -116.0000       -128.0000
Standard dev.   4.7993            5.2808       3.3371          1.3057
Variance        23.0333           27.8865      11.1359         1.7049
Kurtosis        2.5622            -0.6515      -1.0888         -0.8264


To highlight the importance of preprocessing in the analysis of Sigfox network data, the histogram
is plotted and PDF curves are drawn for the original data with missing values and for the data filled
in with the inverse normal distribution method. Table 6 lists the correlation matrix of the distribution
parameters of the data set without missing values for Quito. Applying the three statistical tests and
analyzing the standard error shows that the data are better fitted by a logistic distribution; a lower value
of this metric denotes a more precise estimate of the mean. In all three cases there is a greater
error in the abscissa axis (mu), affecting the symmetry about the vertical axis. It can be concluded that the
logistic distribution is better adapted to the data obtained from radiocommunication systems, which is why
the different propagation models are considered as modifications of the logarithmic model [34]. Table 7
presents the distribution correlation matrix for the LQI range parameters of the data set with no missing
values. By dividing the data by the LQI parameter as indicated in Tables 4 and 5, it can be determined that
the range that most affects the change in the kurtosis of the data is the “excellent” range, which is
very close to reaching a value of 3.


Table 6. Correlation matrix of the distribution parameters of the Quito dataset applying the inverse normal 
distribution method 


                Parameters
Distribution    Mean (dBm)   Variance (dBm)   Mu (dBm)    Std. Err. Mu   Sigma     Std. Err. Sigma
t-Student       -110.1320    104.8470         -110.1320   0.1423         10.2395   0.1006
Extreme value   -111.2620    200.5390         -104.8890   0.1630         11.0414   0.1045
Logistic        -110.3230    110.3220         -110.3230   0.1400         5.7908    0.0671

Table 7. Correlation matrix of the distribution parameters of the LQI applying the inverse normal distribution
method

                                    Link quality indicator (LQI)
Distribution    Parameters          Excellent   Good        Average     Limit
t-Student       Mean (dBm)          -89.0574    -107.4330   -120.7630   -129.5020
                Variance (dBm)      23.0333     31.9707     11.1504     1.7335
                Mu (dBm)            -89.0574    -107.4330   -120.7630   -129.5020
                Std. Err. Mu        0.2452      0.1355      0.0957      0.0870
                Sigma               4.7993      5.6542      3.3392      1.3166
                Std. Err. Sigma     0.1737      0.0958      0.0677      0.0617
Extreme value   Mean (dBm)          -90.0611    -107.8550   -120.7890   -129.4900
                Variance (dBm)      67.7815     56.1976     13.4286     1.8488
                Mu (dBm)            -89.3559    -104.4820   -119.1400   -128.8780
                Std. Err. Mu        0.3496      0.1489      0.0865      0.0738
                Sigma               6.4192      5.8450      2.8572      1.0601
                Std. Err. Sigma     0.2089      0.1021      0.0647      0.0568
Logistic        Mean (dBm)          -89.8981    -107.5620   -120.6520   -129.4270
                Variance (dBm)      19.2803     31.6734     13.3337     2.0283
                Mu (dBm)            -89.8981    -107.5620   -120.6520   -129.4270
                Std. Err. Mu        0.2103      0.0950      0.1029      0.0924
                Sigma               2.4208      3.1028      2.0132      0.7851
                Std. Err. Sigma     0.1078      0.0440      0.0467      0.0420


Figure 5 shows the probabilistic distribution of RSSI data. As demonstrated in Figures 5(a) and (c),
the “excellent” and “average” classifications resemble a logistic distribution. However, the greatest
accumulation of data is found in the “good” range, represented by Figure 5(b); having a large amount of data
favors a better fit to the logistic and normal distributions. The highest
standard error values belong to the extreme value distribution, which fits better the minority of
data in the “limit” and “excellent” ranges, represented by Figures 5(d) and (a) respectively, as indicated by the
standard error parameters in Table 7, considering that the amount of data analyzed is 5174.

Figure 6 shows the probabilistic distribution of the original RSSI data, the filled RSSI data, and
the t-Student, extreme value, and logistic statistical tests. It can be seen that the data fit a logistic
distribution, since it yields a more precise estimate of the mean.

The dimensions of the coverage area of a network can be estimated with propagation models, and the
estimate varies according to the model applied. In the simulations performed, distances
between 12 and 1300 meters were considered, together with a transmitting antenna height of 25 m, a
receiving antenna height of around 2 m, and an operating frequency of 900 MHz. Figure 7(a) shows
the dispersion of the measured data set around the Valle de los Chillos base station, while Figure 7(b) shows
the application of curve fitting and a second approximation to the logarithmic model defined in (2), which
shows the adaptation of logarithmic models to radiocommunication systems. With this procedure, an initial
reference for the propagation model of the measurements was determined, as shown in Figure 7(c).
Figure 5. Probabilistic distribution of RSSI data for LQI: (a) excellent, (b) good, (c) average, and (d) limit

Figure 6. Probabilistic distribution of complete RSSI data


In the graphs, the data are represented as a function of the RSSI power and the distance at which
they were measured with respect to the transmitting antenna. The RSSI can vary due to
internal obstructions and interference, and can decrease due to the effects of reflection, precipitation,
diffraction, penetration, and dispersion caused by buildings. Therefore, the characteristics of the propagation
models make it possible to describe the way in which the Sigfox signal propagates.
Figure 7. Scatter plots for: (a) scatter diagram of the measurements obtained in the Valle de los Chillos, (b)
first and second approximation based on measurements in the Valle de los Chillos, and (c) modelling of
channel losses


A currently known non-traditional method is bootstrap data augmentation,
through which the Valle de los Chillos dataset was increased from 408 to 1000 values. Figures 7(c) and 8(a)
illustrate the comparison of the three propagation models using the root mean square error (RMSE), a good,
very common measure of accuracy and an excellent general-purpose error metric for numerical
forecasts. They indicate that the COST 231 and free space models predict the least
losses, both in the original data and in the data augmented by the bootstrap method,
represented in Figure 8(b). However, the acquired data fit the SUI propagation model more closely,
with a reduction in the precision error in the trend of the curve, achieving an accuracy of 97% with
respect to the curve fit of the data. The SUI model is recommended for transmitting antenna heights at the
base station between 10 and 80 meters and for a frequency range between 0 and
2000 MHz [35], and the study area described in this document meets these parameters. When
comparing the real data with the SUI model, there were no differences greater than 5 dB, in contrast to the free
space and COST 231 models, which presented variations in losses of 20 to 30 dB. One aspect that can be
noticed in all the propagation models described is that over the first 300 meters the losses grow
markedly, because they are proportional to the logarithm of the distance between the transmitter and
receiver [28].
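The RMSE comparison between measured losses and model predictions can be sketched as follows. The function names and the dictionary of predictions are hypothetical; the sketch only shows how the metric ranks candidate models and does not reproduce the values obtained in this study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between two equally long sequences."""
    if len(measured) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def best_model(measured, predictions_by_model):
    """Return (name, rmse) of the model with the smallest RMSE."""
    scores = {name: rmse(measured, pred) for name, pred in predictions_by_model.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]
```

Here `predictions_by_model` would map a label such as `"SUI"` to the losses predicted by that model at the measured distances.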
Figure 8. Scatter plots for: (a) data augmentation application and (b) modelling of channel losses with data
augmentation


4. CONCLUSION 

From the coverage maps of the data set, it can be determined that 67.74% of the Sigfox network
coverage in Quito has good link quality. This percentage is considered suitable for a user or
company to develop IoT applications without problems such as network traffic data loss or connection
failure. The coverage map generated with the database is considered to have good reliability for the 5174
measurements made.

Based on the information provided by the propagation models analyzed, the signal power received
at a given point can be predicted. When implementing a network, the appropriate choice of a propagation
model is essential, since it depends on several parameters of the environment. The tests performed
show that the base station analyzed, being in a suburban environment, is best described by the SUI model,
whose results were more accurate, with
differences of less than 5 dB in the first 300 m. Beyond this distance the values are very similar to the real ones.
In addition, the RMSE is lower for the SUI model, which indicates its greater accuracy.

The data set acquired and analyzed contributes to the design and planning of LPWAN networks in
suburban spaces to guarantee the quality of service in IoT device deployment applications. Through data
augmentation it is possible to increase the size of the database, allowing the development of a coverage map
with higher resolution and better accuracy. In addition, as further research, artificial
intelligence (AI) could be applied to generate predictive algorithms based on neural networks to calculate the
exact location of the Sigfox base stations located in the application area.